Technical Report: Estimating Reliability of Workers for Cooperative Distributed Computing
Authors
Abstract
Internet supercomputing is an approach to solving partitionable, computation-intensive problems by harnessing the power of a vast number of interconnected computers. For the problem of using network supercomputing to perform a large collection of independent tasks, prior work introduced a decentralized approach and provided randomized synchronous algorithms that perform all tasks correctly with high probability while dealing with misbehaving or crash-prone processors. The main weakness of existing algorithms is that they assume either that the average probability of a non-crashed processor returning incorrect results is less than 1/2, or that the probability of returning incorrect results is known to each processor. Here we present a randomized synchronous distributed algorithm that tightly estimates the probability of each processor returning correct results. Starting with the set P of n processors, let F be the set of processors that crash. Our algorithm estimates the probability p_i of returning a correct result for each processor i ∈ P − F, making the estimates available to all these processors. The estimation is based on the (ε, δ)-approximation, where each estimate p̃_i of p_i obeys the bound Pr[p_i(1 − ε) ≤ p̃_i ≤ p_i(1 + ε)] > 1 − δ, for any constants δ > 0 and ε > 0 chosen by the user. An important aspect of this algorithm is that each processor terminates without global coordination. We assess the efficiency of the algorithm in three adversarial models. For the model where the number of non-crashed processors |P − F| is linearly bounded, the time complexity T(n) of the algorithm is Θ(log n), the work complexity W(n) is Θ(n log n), and the message complexity M(n) is Θ(n log n). For the model where |P − F| is bounded by a fractional polynomial (|P − F| = Ω(n^a), for a constant a ∈ (0, 1)), we have T(n) = O(n^(1−a) log n log log n), W(n) = O(n log n log log n), and M(n) = O(n log n). For the model where |P − F| is bounded by a poly-logarithm, we have T(n) = O(n), W(n) = O(n), and M(n) = O(n). All bounds are shown to hold with high probability.
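To make the (ε, δ) guarantee concrete, the following is a minimal single-machine sketch of one standard way to obtain such an estimate: the stopping-rule algorithm of Dagum, Karp, Luby, and Ross for estimating the mean of a random variable taking values in [0, 1]. The abstract does not say which estimator the report uses, so treating the stopping rule as the estimator is an assumption, and the sketch leaves out the distributed, crash-tolerant protocol entirely. The names `stopping_rule_estimate` and `sample` are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import Callable


def stopping_rule_estimate(sample: Callable[[], int], eps: float, delta: float) -> float:
    """Estimate the mean p of a 0/1-valued random variable.

    Illustrative sketch of the Dagum-Karp-Luby-Ross stopping rule:
    draw samples until the running sum of successes reaches a fixed
    threshold, then return threshold / (number of draws). For p > 0 the
    result p_est satisfies Pr[p(1-eps) <= p_est <= p(1+eps)] > 1 - delta.
    """
    if not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("eps and delta must both lie in (0, 1)")

    # Upsilon = 4 (e - 2) ln(2 / delta) / eps^2
    upsilon = 4.0 * (math.e - 2.0) * math.log(2.0 / delta) / (eps * eps)
    # Stopping threshold Upsilon_1 = 1 + (1 + eps) * Upsilon
    threshold = 1.0 + (1.0 + eps) * upsilon

    trials = 0
    successes = 0
    # Note: this loop never terminates if the true success probability is 0.
    while successes < threshold:
        trials += 1
        successes += sample()  # 1 = correct result, 0 = incorrect result
    return threshold / trials


if __name__ == "__main__":
    rng = random.Random(0)
    true_p = 0.7  # hypothetical reliability of one simulated worker
    estimate = stopping_rule_estimate(
        lambda: 1 if rng.random() < true_p else 0, eps=0.1, delta=0.05
    )
    print(f"estimated reliability: {estimate:.3f}")
```

The number of samples drawn is not fixed in advance: it grows roughly as 1/p, so an unreliable worker needs more observations than a reliable one before its estimate meets the same relative-error bound.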
Similar resources
Estimating Reliability in Mobile ad-hoc Networks Based on Monte Carlo Simulation (TECHNICAL NOTE)
Each system has its own definition of reliability. Reliability in mobile ad-hoc networks (MANET) could be interpreted as the probability that a message from a source node successfully reaches its destination. The variability and volatility of the MANET configuration make typical reliability methods (e.g. reliability block diagram) inappropriate. This is because no single structure or configurat...
Improving the Reliability of Cooperative Concurrent Systems with Exception Flow Analysis
Developers of fault-tolerant distributed systems must guarantee that the fault tolerance mechanisms they build are, themselves, reliable. Otherwise, these mechanisms might end up contributing negatively to overall system dependability, thus defeating the purpose of introducing fault tolerance into the system. To achieve the desired levels of reliability, the development of mechanisms for detect...
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
TECHNICAL REPORT 94-14 A Framework for Customizing Coherence Protocols of Distributed File Caches in Lucas File System
In cooperative applications such as group CAD and group software development systems, multiple processes communicate with each other by sharing complex data consisting of nested structures and pointers. Although the sharing of complex data structures in the distributed environment is achieved through the technology of distributed shared memory, a single cache coherence protocol cannot efficiently...
A novel cooperative game between client and subcontractors based on technical characteristics
Large projects often have several activities which are performed by some subcontractors with several skills. Costs and time reduction and quality improvement of the project are very important for client and subcontractors. Therefore, in real large projects, subcontractors join together and form coalitions for improving the project profit. A key question is how an extra profit of cooperation amo...
Journal title:
- CoRR
Volume abs/1407.0696, Issue -
Pages -
Publication date 2014